 model development


Aleatoric and Epistemic Discrimination: Fundamental Limits of Fairness Interventions

Neural Information Processing Systems

Machine learning (ML) models can underperform on certain population groups due to choices made during model development and bias inherent in the data. We categorize sources of discrimination in the ML pipeline into two classes: aleatoric discrimination, which is inherent in the data distribution, and epistemic discrimination, which is due to decisions made during model development. We quantify aleatoric discrimination by determining the performance limits of a model under fairness constraints, assuming perfect knowledge of the data distribution. We demonstrate how to characterize aleatoric discrimination by applying Blackwell's results on comparing statistical experiments. We then quantify epistemic discrimination as the gap between a model's accuracy when fairness constraints are applied and the limit posed by aleatoric discrimination. We apply this approach to benchmark existing fairness interventions and investigate fairness risks in data with missing values. Our results indicate that state-of-the-art fairness interventions are effective at removing epistemic discrimination on standard (overused) tabular datasets. However, when data has missing values, there is still significant room for improvement in handling aleatoric discrimination.


Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement

Strassenburg, Nils, Glavic, Boris, Rabl, Tilmann

arXiv.org Artificial Intelligence

Businesses increasingly rely on large language models (LLMs) to automate simple repetitive tasks instead of developing custom machine learning models. LLMs require few, if any, training examples and can be utilized by users without expertise in model development. However, this comes at the cost of substantially higher resource and energy consumption compared to smaller models, which often achieve similar predictive performance for simple tasks. In this paper, we present our vision for just-in-time model replacement (JITR), where, upon identifying a recurring task in calls to an LLM, the model is replaced transparently with a cheaper alternative that performs well for this specific task. JITR retains the ease of use and low development effort of LLMs, while saving significant cost and energy. We discuss the main challenges in realizing our vision regarding the identification of recurring tasks and the creation of a custom model. Specifically, we argue that model search and transfer learning will play a crucial role in JITR to efficiently identify and fine-tune models for a recurring task. Using our JITR prototype Poodle, we achieve significant savings for exemplary tasks.
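The replacement idea described in this abstract can be illustrated with a minimal sketch. It is not Poodle's implementation: the class `JitrRouter`, the explicit `task_id` argument, the call-count threshold, and the two stand-in models are all hypothetical names introduced here. The sketch also assumes the recurring task is already identified by the caller, whereas the abstract names task identification as an open challenge.

```python
from collections import Counter
from typing import Callable, Dict

Model = Callable[[str], str]


class JitrRouter:
    """Hypothetical router: send a task to the LLM until it has recurred
    often enough and a cheaper replacement model is available."""

    def __init__(self, llm: Model, threshold: int = 3) -> None:
        self.llm = llm
        self.threshold = threshold          # assumed recurrence cutoff
        self.counts: Counter = Counter()    # calls seen per task
        self.replacements: Dict[str, Model] = {}

    def register_replacement(self, task_id: str, model: Model) -> None:
        # In a real system this model would come from model search and
        # fine-tuning; here it is simply supplied by the caller.
        self.replacements[task_id] = model

    def call(self, task_id: str, text: str) -> str:
        self.counts[task_id] += 1
        cheap = self.replacements.get(task_id)
        if cheap is not None and self.counts[task_id] > self.threshold:
            return cheap(text)   # transparent replacement
        return self.llm(text)    # default: the expensive LLM


# Stand-in models, for illustration only.
def fake_llm(text: str) -> str:
    return "llm:" + text


def small_model(text: str) -> str:
    return "small:" + text


router = JitrRouter(fake_llm, threshold=2)
router.register_replacement("sentiment", small_model)
outputs = [router.call("sentiment", "great product") for _ in range(3)]
```

With a threshold of 2, the first two calls go to the stand-in LLM and the third is served by the smaller model, while the caller's interface stays the same.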



SmartMLOps Studio: Design of an LLM-Integrated IDE with Automated MLOps Pipelines for Model Development and Monitoring

Jin, Jiawei, Su, Yingxin, Zhu, Xiaotong

arXiv.org Artificial Intelligence

The rapid expansion of artificial intelligence and machine learning (ML) applications has intensified the demand for integrated environments that unify model development, deployment, and monitoring. Traditional Integrated Development Environments (IDEs) focus primarily on code authoring, lacking intelligent support for the full ML lifecycle, while existing MLOps platforms remain detached from the coding workflow. To address this gap, this study proposes the design of an LLM-Integrated IDE with automated MLOps pipelines that enables continuous model development and monitoring within a single environment. The proposed system embeds a Large Language Model (LLM) assistant capable of code generation, debugging recommendation, and automatic pipeline configuration. The backend incorporates automated data validation, feature storage, drift detection, retraining triggers, and CI/CD deployment orchestration. This framework was implemented in a prototype named SmartMLOps Studio and evaluated using classification and forecasting tasks on the UCI Adult and M5 datasets. Experimental results demonstrate that SmartMLOps Studio reduces pipeline configuration time by 61%, improves experiment reproducibility by 45%, and increases drift detection accuracy by 14% compared to traditional workflows. By bridging intelligent code assistance and automated operational pipelines, this research establishes a novel paradigm for AI engineering, transforming the IDE from a static coding tool into a dynamic, lifecycle-aware intelligent platform for scalable and efficient model development.


mAIstro: an open-source multi-agentic system for automated end-to-end development of radiomics and deep learning models for medical imaging

Tzanis, Eleftherios, Klontzas, Michail E.

arXiv.org Artificial Intelligence

Agentic systems built on large language models (LLMs) offer promising capabilities for automating complex workflows in healthcare AI. We introduce mAIstro, an open-source, autonomous multi-agentic framework for end-to-end development and deployment of medical AI models. The system orchestrates exploratory data analysis, radiomic feature extraction, image segmentation, classification, and regression through a natural language interface, requiring no coding from the user. Built on a modular architecture, mAIstro supports both open- and closed-source LLMs, and was evaluated using a large and diverse set of prompts across 16 open-source datasets, covering a wide range of imaging modalities, anatomical regions, and data types. The agents successfully executed all tasks, producing interpretable outputs and validated models. This work presents the first agentic framework capable of unifying data analysis, AI model development, and inference across varied healthcare applications, offering a reproducible and extensible foundation for clinical and research AI integration. The code is available at: https://github.com/eltzanis/mAIstro



WatchAnxiety: A Transfer Learning Approach for State Anxiety Prediction from Smartwatch Data

Ahmed, Md Sabbir, French, Noah, Rucker, Mark, Wang, Zhiyuan, Myers-Brower, Taylor, Petz, Kaitlyn, Boukhechba, Mehdi, Teachman, Bethany A., Barnes, Laura E.

arXiv.org Artificial Intelligence

Social anxiety is a common mental health condition linked to significant challenges in academic, social, and occupational functioning. A core feature is elevated momentary (state) anxiety in social situations, yet little prior work has measured or predicted fluctuations in this anxiety throughout the day. Capturing these intra-day dynamics is critical for designing real-time, personalized interventions such as Just-In-Time Adaptive Interventions (JITAIs). To address this gap, we conducted a study with socially anxious college students (N=91; 72 after exclusions) using our custom smartwatch-based system over an average of 9.03 days (SD = 2.95). Participants received seven ecological momentary assessments (EMAs) per day to report state anxiety. We developed a base model on over 10,000 days of external heart rate data, transferred its representations to our dataset, and fine-tuned it to generate probabilistic predictions. These were combined with trait-level measures in a meta-learner. Our pipeline achieved 60.4% balanced accuracy in state anxiety detection in our dataset. To evaluate generalizability, we applied the training approach to a separate hold-out set from the TILES-18 dataset, the same dataset used for pretraining. On 10,095 once-daily EMAs, our method achieved 59.1% balanced accuracy, outperforming prior work by at least 7%.
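The abstract reports balanced accuracy without defining it. The sketch below assumes the standard definition (the mean of per-class recall) and uses made-up toy labels, not the study's data, to show how the metric differs from plain accuracy when classes are imbalanced.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (standard definition, assumed here)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    if not total:
        raise ValueError("y_true is empty")
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy, imbalanced labels (illustrative only): 1 = high state anxiety, 0 = low.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
```

On these toy labels plain accuracy is 0.9, while balanced accuracy is 0.75, because recall on the minority class is only 0.5. A classifier that always predicts the majority class would score 0.5 on a two-class problem, which is the baseline against which figures such as 60.4% are usually read.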


MAIA: A Collaborative Medical AI Platform for Integrated Healthcare Innovation

Bendazzoli, Simone, Persson, Sanna, Astaraki, Mehdi, Pettersson, Sebastian, Grozman, Vitali, Moreno, Rodrigo

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) integration in healthcare has emerged as a transformative force, promising to revolutionize patient care, optimize resource allocation, and enhance clinical decision-making [2, 10]. As the healthcare ecosystem increasingly recognizes the importance of AI-powered tools, there is a growing need for collaborative platforms to facilitate the development, deployment, and management of AI solutions in medical settings [7, 13]. Modern healthcare institutions are facing complex challenges that demand sophisticated technological solutions. A comprehensive Medical AI Platform can serve as a powerful foundation for addressing these complex needs, effectively bridging technological capabilities with clinical requirements. One of the open challenges in healthcare is the management of the vast amounts of data handled in clinical settings. Cloud-based medical AI platforms can provide new opportunities for computational resource sharing, enabling institutions to optimize data storage and collaborative research environments. By creating a unified and standardised ecosystem, these platforms break down traditional institutional barriers, facilitating knowledge exchange between medical professionals, data scientists, and researchers.


From Data-Driven to Purpose-Driven Artificial Intelligence: Systems Thinking for Data-Analytic Automation of Patient Care

Anadria, Daniel, Dobbe, Roel, Giachanou, Anastasia, Kuiper, Ruurd, Bartels, Richard, van Amsterdam, Wouter, de Troya, Íñigo Martínez de Rituerto, Zürcher, Carmen, Oberski, Daniel

arXiv.org Artificial Intelligence

In this work, we reflect on the data-driven modeling paradigm that is gaining ground in AI-driven automation of patient care. We argue that the repurposing of existing real-world patient datasets for machine learning may not always represent an optimal approach to model development as it could lead to undesirable outcomes in patient care. We reflect on the history of data analysis to explain how the data-driven paradigm rose to popularity, and we envision ways in which systems thinking and clinical domain theory could complement the existing model development approaches in reaching human-centric outcomes. We call for a purpose-driven machine learning paradigm that is grounded in clinical theory and the sociotechnical realities of real-world operational contexts. We argue that understanding the utility of existing patient datasets requires looking in two directions: upstream towards the data generation, and downstream towards the automation objectives. This purpose-driven perspective to AI system development opens up new methodological opportunities and holds promise for AI automation of patient care.


Role and Use of Race in AI/ML Models Related to Health

Were, Martin C., Li, Ang, Malin, Bradley A., Yin, Zhijun, Coco, Joseph R., Collins, Benjamin X., Clayton, Ellen Wright, Novak, Laurie L., Hendricks-Sturrup, Rachele, Oluyomi, Abiodun, Anders, Shilo, Yan, Chao

arXiv.org Artificial Intelligence

The role and use of race within health-related artificial intelligence and machine learning (AI/ML) models has sparked increasing attention and controversy. Despite the complexity and breadth of related issues, a robust and holistic framework to guide stakeholders in their examination and resolution remains lacking. This perspective provides a broad-based, systematic, and cross-cutting landscape analysis of race-related challenges, structured around the AI/ML lifecycle and framed through "points to consider" to support inquiry and decision-making.

INTRODUCTION

The role and use of the social construct of race within health-related artificial intelligence and machine learning (AI/ML) models has become a subject of increased attention and controversy. As noted in the National Academies recent report "Ending Unequal Treatment", it is increasingly clear that race in all its complexity is a powerful predictor of unequal treatment and health care outcomes.